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Abstract. We propose a method to construct a proposal density for the 
Metropolis-Hastings algorithm in Markov Chain Monte Carlo (MCMC) 
simulations of the GARCH model. The proposal density is constructed 
adaptively by using the data sampled by the MCMC method itself. It 
turns out that autocorrelations between the data generated with our 
adaptive proposal density are greatly reduced. Thus it is concluded that 
the adaptive construction method is very efficient and works well for the 
MCMC simulations of the GARCH model. 
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1 Introduction 

It is well known that financial time series of asset returns show various interesting 
properties which can not be explained from the assumption that the time series 
obeys the Brownian motion. Those properties are classified as stylized factspQ. 
Some examples of the stylized facts are (i) fat-tailed distribution of return (ii) 
volatility clustering (iii) slow decay of the autocorrelation time of the absolute 
returns. The true dynamics behind the stylized facts is not fully understood. 
There are some attempts to make physical models based on spin dynamics [2] -[3] 
and they are able to capture some of the stylized facts. 

In finance volatility is an important quantity to measure risk. To forecast 
volatility, various empirical models to mimic the properties of the volatility 
have been proposed. In 1982 EnglefTU] proposed Autoregressive Conditional Het- 
eroskedasticity (ARCH) model where the present volatility is assumed to depend 
on squares of the past observations. Later BollerslevjTl] proposed Generalized 
ARCH (GARCH) model which includes additional past volatility terms to the 
present volatility estimate. 

A conventional approach to infer GARCH model parameters is the Maxi- 
mum Likelihood (ML) estimation where the GARCH parameters are obtained 
as the values which maximaize the likelihood function of the GARCH model. 
The maximization of the likelihood function can be done by the maximization 
tool available in computer libraries. A practical difficulty of the maximization 
procedure is that the output results are often sensitive to starting values. 
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An alternative approach, which recently becomes popular, is the Bayesian 
inference. Usually the Bayesian inference procedure is performed by MCMC 
methods. There is no unique way to implement MCMC methods. So far a vari- 
ety of methods to MCMC procedure have been developed [2]- [TS]. In a recent 
survey [T%] it is shown that Acceptance- Rejection/Metropolis-Hastings (AR/MH) 
algorithm works better than other algorithms. In the AR/MH algorithm the 
proposal density is assumed to be a multivariate Student's t-distribution and its 
parameters are estimated by the ML technique. Here we develop a method to 
determine parameters of a multivariate Student's t-distribution, which does not 
rely on the ML method. In our method the proposal density is also assumed to be 
a multivariate Student's t-distribution but the parameters are determined by an 
MCMC simulation. During the MCMC simulation, the parameters are updated 
adaptively using the data generated so far. We test our method using artificial 
GARCH data and show that the method substantially reduces the correlations 
between the sampled data and works well for GARCH parameter estimations. 



2 GARCH Model 

The GARCH(p,q) model to the time series data yt is given by 

Vt = cr t e t , (1) 

q p 

°t=u + J2 <*ivi-i + J2 > ( 2 ) 

»=1 i=l 

where u> > 0, Oj > and j3i > to ensure a positive volatility. Furthermore the 
stationary condition given by 

q p 

5>i+^A<l (3) 

i=l i=l 

is also required, et is an independent normal error ~ N(0, 1). In many empirical 
studies it is shown that (p = q = 1) GARCH model well captures the properties 
of the financial time series volatility. Thus in this study we use GARCH(1,1) 
model for our testbed. The volatility of of the GARCH model is now written as 

<j\ = u + ay 2 ^ +/9<J t 2 _ 1 , (4) 

where a, (3 and u> are the parameters to be estimated. 

Let 9 = (uj,a,/3) be a parameter set of the GARCH model. The likelihood 
function of the GARCH model is written as 

L( y |0)=77f =1 -=L=exp(-4). (5) 
V27T07 a t 

This function plays a central role in ML estimations and also for the Bayesian 
inference. 
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3 Bayesian inference 



In this section we briefly describe the Bayesian inference which estimates the 
GARCH parameters numerically by using the MCMC method. From the Bayes' 
theorem the posterior density n(9\y) with data y — (y 1 ,y 2 , ■ ■ ■ , y n ) is given by 

tt(%) cx L(y\6)ir(9), (6) 

where L(y\9) is the likelihood function. ir(6) is the prior density for 9. The 
functional form of tt(9) is not known a priori. Here we assume that the prior 
density ir(9) is constant. vr(0|y) gives a probability distribution of 9 when the 
data y are given. 

With this ir{9\y) values of the parameters are inferred as the expectation 
values of 9 given by 

(9) = \ j 9n(9\y)d9, (7) 

where 

\{9\y)d9. (8) 
Z is a normalization constant irrelevant to MCMC estimations. 



3.1 MCMC 



In general the integral of eq. ([7]) can not be performed analytically. The MCMC 
technique gives a method to estimate eq.([7|) numerically. The basic procedure of 
the MCMC method is as follows. First we sample 9 drawn from the probability 
distribution Tr(9\y). Sampling is done by a technique which produces a Markov 
chain. After sampling some data, we obtain the expectation value as an average 
value over the sampled data 0« = (9 ( - 1 \ . . . , 9^), 

i=l 

where k is the number of the sampled data. The statistical error for k indepen- 
dent data is proportional to ^4=. In general, however, the data generated by the 
MCMC method are correlated. As a result the statistical error will be propor- 
tional to where r is the autocorrelation time between the sampled data. 
The autocorrelation time depends on the MCMC method we employ. Thus it is 
desirable to choose an MCMC method generating data with a small r. 



3.2 Metropolis-Hastings algorithm 

The most general and simple method to draw values from a given probability 
distribution is the Metropolis method[T^] or its generalized version, Metropolis- 
Hastings method [IB] . Let P{x) is a probability distribution from which data x 
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shall be sampled. First starting from x, we propose a candidate x' which is drawn 
from a certain probability distribution g(x'\x) which we call proposal density. 
Then we accept the candidate x with a probability Pmh(x, x') as the next value 
of the Markov chain: 



Pmh{x,x') 



f, 



P(x') g(x\x') 



P(x) g{x'\x) 



(10) 



If x' is rejected we keep the previous value x. 

When g(x\x') = g(x'\x), eq. lflT))) reduces to the Metropolis accept probability: 



PMetro{x,x) = min 



1. 



P(x) 



(11) 



4 Adaptive construction of proposal density 



Disadvantages of the MH method are that the candidate drawn as the next value 
is not always accepted and in general the data sampled by the Markov chain are 
correlated, which results in increasing statistical errors. 

If the proposal density is close enough to the posterior density the acceptance 
in the MH method can be high. The posterior density of GARCH parameters 
often resembles to a Gaussian-like shape. Thus one may choose a density similar 
to a Gaussian distribution as the proposal density. Following [T71 HH] , in order 
to cover the tails of the posterior density we use a (p-dimensional) multivariate 
Student's t-distribution given by 



r((z,+p)/2)/J>/2) 



det r 1 /2(„ 7r )p/2 
where 9 and M are column vectors 



1 



M) t S^ 1 (6 - M) 



>+p)/2 



(12) 





'01~ 




'Mr 








M 2 






,M = 






_9 P _ 




_M P _ 



(13) 



and Mi = E(0i). 2J is the covariance matrix defined as 



v-2 



E[{6 - M){6 - Mf 



(14) 



v is a parameter to tune the shape of Student's t-distribution. When v — > oo the 
Student's t-distribution goes to a Gaussian distribution. At small v Student's 
t-distribution has a fat-tail. 

For our GARCH model p = 3 and 6 = (0i,6 2 ,O 3 ) = (a,f3,co), and thus 
17 is a 3 x 3 matrix. We determine these unknown parameters M and S by 
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MCMC simulations. First we make a short run by the Metropolis algorithm and 
accumulate some data. Then we estimate M and E from the data. Note that 
there is no need to estimate M and E accurately. Second we perform an MH 
simulation with the proposal density of eq. (fT2")) with the estimated M and E. 
After accumulating more data, we recalculate M and E, and update M and 
E of eq . (|12|) . By doing this, we adaptively change the shape of eq. p2[l to fit 
the posterior density. We call eq. (fT^|) with the estimated M and E "adaptive 
proposal density" . 

5 Numerical simulations 

In order to test the adaptive construction method we use artificial GARCH data 
generated with a known parameter set and try to infer the parameters of the 
GARCH model from the artificial GARCH data. The GARCH parameters are 
set to a = 0.1, (3 = 0.8 and ui = 0.1. Then using these parameters we generated 
2000 data. For this artificial data we perform MCMC simulations by the adaptive 
construction method. 

Implementation of the adaptive construction method is as follows. First we 
start a run by the Metropolis algorithm. The first 3000 data are discarded as 
burn-in process or in other words thermalization. Then we accumulate 1000 data 
for M and E estimations. The estimated M and E are substituted to g{9) of 
cq. (fl"2"j) . We re-start a run by the MH algorithm with the proposal density g{9). 
Every 1000 update we re-calculate M and E and update g{9). We accumulate 
199000 data for analysis. To check v parameter dependence on the MCMC esti- 
mations we use v = (4, 6, 8, 10, 12, 20) and perform the same MCMC simulation 
for each v. Later we find that v dependence on the MCMC results is weak. 
Therefore the results from ^ = 10 simulations will be mainly shown. 



Table 1. Results of parameters. 





a 







Adaptive [y = 10) 


0.10374 


0.7789 


0.11532 


standard deviation 


0.019 


0.045 


0.034 


statistical error 


0.00006 


0.0002 


0.00014 


2r 


2.3 ±0.2 


3.0 ±0.3 


3.4 ±0.8 


Metropolis 


0.1033 


0.7797 


0.1149 


standard deviation 


0.019 


0.045 


0.034 


statistical error 


0.0005 


0.0017 


0.0012 


2r 


440 ± 90 


900 ± 190 


830 ± 170 



For comparison we also make a Metropolis simulation and accumulate 600000 
data for analysis. In this study the Metropolis algorithm is implemented as 
follows. We draw a candidate 9' by adding a small random value 89 to the 
present value 9: 
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6' = 9 + 59, (15) 

where 69 = d(r — 0.5). r is a uniform random number in [0, 1] and d is a constant 
to tune the Metropolis acceptance. We choose d so that the acceptance becomes 
greater than 50%. 




10000 12000 14000 10000 12000 14000 



MC time MC time 

Fig. 1. Monte Carlo histories of a from the adaptive construction method with v = 
10(left) and the Metropolis algorithm (right). 

Fig. 1 compares the Monte Carlo history of a generated by the adaptive 
construction method with that by the Metropolis algorithm. It is clearly seen 
that the data a generated by the Metropolis algorithm are very correlated. For 
other parameters (3 and uu we also see similar behavior. 

To quantify the correlation we measure the autocorrelation function (ACF). 
The ACF of certain successive data x is defined by 

Tf E^i < x >)(x(j + t)- < x >) 
ACF(t) = N ^~ iy yjJ Jl^l '. i, (16) 

where < x > and are the average value and the variance of x respectively. 

Fig. 2 shows the ACF for the adaptive construction method and the Metropo- 
lis algorithm. The ACF of the adaptive construction method decreases quickly 
as Monte Carlo time t increases. On the other hand the ACF of the Metropolis 
algorithm decreases very slowly which indicates that the correlation between the 
data is very large. 

Using the ACF, the autocorrelation time r is calculated as 

T =2+5> CFW - (17) 

i=l 

Results of t are summarized in Table 1. The values of r from the Metropolis 
simulations are very large, typically several hundreds. On the other hand we see 
very small correlations, 2r ~ 2 — 3 for the adaptive construction method. Thus 
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the adaptive construction method works well for reducing correlations between 
the sampled data. 

We examine how the covariance matrix E varies during the simulations. Here 
let us define a symmetric matrix V as 

V = E[{6 - M)(6 - M)\ (18) 

and 8 = (61,82,03) = (a,f3,u>). Instead of E, we analys this V since V should 
be same for all v and it is easy to see the convergence property. 

In Fig. 3 we show how Vn(= V aa ) and V2%(= V^ w ) change as the simulations 
are proceeded. We see that Vn and V23 converge to some values. We also find 
similar behavior for other Vij . The final output of the matrix elements of V from 
the simulations is as follows. 

/ 3.6 x 1(T 4 -5.8 x 1CT 4 2.6 x 1(T 4 \ 
V = -5.8 x 1(T 4 2.1 x lfr 3 -1.4 x 10" 3 . (19) 
\ 2.6 x 10" 4 -1.4 x 10" 3 1.2 x 10" 3 / 
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From this result we find that Vi2(= V a p) and V2z{— Vp w ) are negative, and 
V\${= V auJ ) is positive. Fig. 4 also displays these correlation properties. 




'o 0.2 0.4 0.6 0.8 



Fig. 4. Scatter plot of sampled data for (a,/3), {/3,u)) and (a,ui). 




75 
(xlOOO) 



Fig. 5. Acceptance at MH step with the adaptive proposal density. 



Fig. 5 shows values of the acceptance at the MH algorithm with the adaptive 
proposal density of eq. (T2j) . Each acceptance is calculated every 1000 updates 
and the calculation of the acceptance is based on the latest 1000 data. At the first 
stage of the simulation the acceptance is low because M and E are not calculated 
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accurately as shown in Fig. 3. However the acceptances increase quickly as the 
simulations are proceeded and reaches plateaus. Typically the acceptances are 
more than 70% except for v — 4. Probably v = 4 proposal density is less efficient 
because the tail of the proposal density is too heavy to cover the tail of the 
posterior density. 
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Fig. 6. Results of the GARCH parameters estimated by MCMC methods. The results 
at v — 40 are from the Metropolis simulations. The error bars show statistical errors 
only and they are calculated by the jackknife method. 
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Fig. 6 shows results of the GARCH parameters estimated by the MCMC 
methods. The values of the GARCH parameters are summarized in Table 1. 
The results from the adaptive construction method have much smaller error 
bars and are consistent each other. On the other hand the Metropolis results 
have larger error bars although the number of the sampled data is larger than 
that of the adaptive construction method. This is because the data sampled 
by the Metropolis algorithm are long-correlated. The all results with standard 
deviations agree with the input GARCH parameters (a — 0.1, j3 — 0.8 and 
lo = 0.1). This indecates that the MCMC estimations are correctly done. 
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6 Summary 

We proposed a method to construct a proposal density used in the MH algo- 
rithm. The construction of the proposal density is performed using the data 
generated by MCMC methods. During the MCMC simulations the proposal 
density is updated adaptively. The numerical results show that the adaptive 
construction method significantly reduces the correlations between the sampled 
data. The autocorrelation time of the adaptive construction method is calculated 
to be 2r ~ 2 — 3. This autocorrelation time is similar to that of the AR/MH 
method [TH] which uses the ML estimation. Thus the efficiency of the adaptive 
construction method is comparable to that of the AR/MH method. This is not 
surprising because both methods construct the essentially same proposal den- 
sity in different ways. Therefore the adaptive construction method serves as an 
alternative efficient method for GARCH parameter inference without using ML 
estimations. 
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